ABSTRACT

The purpose of this study was to examine the reliability and
validity of a basal reading series mastery test. Subjects were 47
fifth graders, who were tested on the SRA Reading Achievement Test,
the Ginn 720 End-of-level 11 Mastery Test (MT), and the Word Reading
Test. A subgroup of 22 children was tested a second time on the MT.
Traditional psychometric correlational analyses as well as strategies
specifically designed for examining the adequacy of
criterion-referenced tests were applied to the data to investigate
the following dimensions of the technical adequacy of the MT: (1)
consistency of student performance across two administrations of the
MT, and (2) criterion-related validity of the MT scores with respect
to two other measures of reading proficiency, and criterion-related
validity of the MT mastery/nonmastery decisions with respect to
pre/post instructional status. Results indicated that reliability
and validity were acceptable for the composite test scores, but
variable for the subtests. Implications for the development and use
of criterion-referenced tests are discussed. (Author)

Technical Adequacy of Basal Readers' Mastery Tests:
The Ginn 720 Series

Two measurement formats currently are widely used in educational
settings for evaluating student progress and the effects of
instructional programs. The first approach is based on the
long-standing tradition of administering published, norm-referenced
tests on a pre and post basis. Frequently, the tests themselves
demonstrate strong psychometric characteristics such as reliability,
criterion-related validity, and appropriate norms. Nevertheless, this
traditional assessment practice has been criticized severely for a
number of reasons, including the assessment of global versus specific
skills and lack of content validity. Additionally, traditional
procedures are plagued with problems of gain scores, regression toward
the mean, and the poor reliability of difference scores. In an
attempt to ameliorate many of these problems, particularly those of
content validity, the second approach, criterion-referenced (CR)
instruments, has been developed to determine a student's mastery of
specific curricula. Ideally, these tests measure exactly what has
been taught. Despite the strong content validity of many CR tests,
there is scant research addressing the reliability and
criterion-related validity of these instruments. Therefore, neither
available format can be used with certainty to assess a student's
progress or mastery.

In response to this dilemma, researchers recently have begun the
task of investigating the validity and reliability of available CR
tests. Tindal, Shinn, Fuchs, Fuchs, Deno, and Germann (1983) examined
a typical mastery test from the Houghton-Mifflin reading series and
found that both its test-retest reliability and its criterion-related
validity were less than adequate for the decoding and comprehension
test scales. This finding documents the notion that content validity
is a necessary, but insufficient aspect of criterion-referenced test
adequacy, and it underscores the importance of investigating the
reliability and validity of each criterion-referenced test.

The purpose of the current study was to extend the work of Tindal 
et al. (1983) by examining the reliability and validity of another 
basal series mastery test, that of Ginn 720. In doing so, the present 
study sought to provide information of interest not only to consumers 
of this specific measure but also to users of other CR tests for which 
technical data also are still unavailable. 

Method 

Subjects 

Subjects were 47 students (27 M, 20 F) from two fifth grade 
classes. Each class represented a school district within a rural 
midwestern cooperative. The students' mean reading percentile rank 
was 45.1 (SD = 27.8) as measured on the Science Research Associates
(SRA) Reading Achievement Test. Only those students for whom there
were no missing data were included in any given analysis. 

Measures 

Three measures of reading performance were used in the study: a
basal series criterion-referenced test, a global norm-referenced test,
and a curriculum-based word reading test.

Criterion-referenced test. Four scales of the End-of-Level 11
Mastery Test (MT; Clymer, Blanton, Johnson, & Lapp, 1980) of the Ginn
720 and Ginn 720 Rainbow Edition reading series were employed as
measures. Each of the four scales, Comprehension, Vocabulary,
Decoding, and Study Skills, is comprised of subtests. Table 1 lists
the subtests constituting each scale and provides brief descriptions
of tasks the examinee is required to do within each subtest. This MT
is criterion-referenced, with items per subtest ranging from 6 to 25
and with mastery-nonmastery cutoff scores established at 79% to 86%
correct responses.



Insert Table 1 about here 

Norm-referenced test. The Science Research Associates (SRA)
Reading Achievement Test (Naslund, Thorpe, & Lefever, 1978) is
comprised of two subtests: vocabulary and comprehension. In the
vocabulary section, examinees are required to select, from four
alternatives, a synonym for an underlined word in a sentence. In the
comprehension section, examinees read 200-300 word passages and answer
questions in a multiple choice format. Total test score is based on a
linear combination of the two subtests. Internal consistency
reliability was reported at .88 (Salvia & Ysseldyke, 1981).

Curriculum-based word reading test. The Word Reading Test (Deno,
Mirkin, & Chiang, 1982) requires children to read aloud passages and
isolated word lists and is scored in terms of average numbers of words
correct and incorrect over two alternate forms of the Isolated Word
Reading and Passage Reading Scales. The 200-word passages are drawn
randomly from a student's grade appropriate basal reading book; the
150-word lists sample words randomly from the basals, with 60% of
words drawn from the student's grade appropriate level and 40% sampled
equally from all previous levels. For the Passage and Isolated Word
Reading tests, test-retest and alternate form reliabilities were at
least .90 (Fuchs, Deno, & Marston, in press; Fuchs, Wesson, Tindal,
Mirkin, & Deno, 1981).

Procedure

All students were tested in groups by a school psychologist on
the SRA Reading Achievement Test, and by their classroom teachers on
the MT. The Word Reading Test was administered individually by
trained aides. Standardized administration procedures were adhered to
on all tests. Testing time ranged from 60 to 90 minutes for the SRA
Test, 60 to 90 minutes for the MT, and five to six minutes for the
Word Reading Test.

To address test-retest reliability questions, a subgroup of 22
students (12 M, 10 F) was administered the following measures in the
following order within a 2-week time period: the MT, the SRA Reading
Achievement Test, the Word Reading Test, and the MT again. For the
remaining 20 students, each measure was given one time within a 3-week
period, with the order of administration random.

Data Analysis

Consistency of performance on two administrations of the same
test. Consistency of students' performance on the MT was assessed in
three ways. In all three analyses, the students who had been tested
twice on the MT (N=22) were the subjects. First, traditional
test-retest reliability was determined by correlating scores from the
two administrations of the MT. The other two analysis strategies were
designed specifically for criterion-referenced measures (see Millman,
1974). In the first of these, consistency of students' subtest scores
was determined by (a) computing individuals' percentage correct score
on each subtest for each administration of the MT, (b) calculating for
each individual his/her difference score across the two
administrations of each subtest, and (c) determining the percentages
of examinees having each possible difference score on each subtest.
In the second strategy, consistency of mastery-nonmastery decisions on
subtests was determined by dividing the difference between observed
and chance proportions of agreements in decisions by the maximum value
that difference could assume. (The chance proportion of agreements
was computed by multiplying and then summing the marginal proportions
of the same decision categories for the two administrations, as done
in a chi-square test of association.)
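
The chance-corrected agreement described above can be illustrated with a short sketch. It assumes that the maximum value the observed-minus-chance difference can take is 1 minus the chance proportion (the form used in Cohen's kappa); the paper does not spell out how that maximum was obtained, so this is one plausible reading rather than the authors' exact computation. The function name and the mastery decisions for 22 examinees are hypothetical.

```python
def corrected_agreement(first, second):
    """Return (observed, chance, corrected) agreement for two lists of
    mastery decisions (True = mastery, False = nonmastery)."""
    if len(first) != len(second) or not first:
        raise ValueError("need two non-empty lists of equal length")
    n = len(first)
    # Observed proportion of examinees given the same decision twice.
    observed = sum(a == b for a, b in zip(first, second)) / n
    # Marginal proportions of mastery on each administration.
    p1 = sum(first) / n
    p2 = sum(second) / n
    # Chance agreement: multiply matching marginals, then sum.
    chance = p1 * p2 + (1 - p1) * (1 - p2)
    if chance == 1.0:
        raise ValueError("chance agreement is 1; correction is undefined")
    # Assumption: the maximum of (observed - chance) is (1 - chance).
    corrected = (observed - chance) / (1.0 - chance)
    return observed, chance, corrected


# Hypothetical decisions for 22 examinees (not data from the study).
first_admin = [True] * 14 + [False] * 8
second_admin = [True] * 12 + [False] * 2 + [True] * 1 + [False] * 7
observed, chance, corrected = corrected_agreement(first_admin, second_admin)
print(f"observed={observed:.2f} chance={chance:.2f} corrected={corrected:.2f}")
```

With these invented decisions, 19 of 22 examinees receive the same decision twice, and roughly half of that agreement would be expected by chance alone, which is why the corrected value is noticeably lower than the uncorrected one.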

Criterion validity. The criterion validity of the MT was
determined in two ways, employing the entire group of subjects (N=47).
The traditional psychometric strategy of correlating scores on the
measure of interest (MT) with criterion measures was used. The SRA
Reading Achievement Test and the Word Reading Test were employed as
the criterion measures. Additionally, chi-square statistical tests
were applied to contingency tables wherein mastery-nonmastery
represented one dimension of each table and pre-post instructional
status represented the other dimension. Percentages of
misclassifications and phi coefficients supplemented the chi-square
tests.
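
As a minimal sketch of the contingency-table analysis just described, the following computes a Pearson chi-square statistic and the percentage of misclassified examinees. The 2 x 2 layout, the function names, and the cell counts are assumptions made for illustration; the paper does not report its cell counts or the exact dimensions of its tables.

```python
def chi_square(table):
    """Pearson chi-square statistic for a contingency table given as a
    list of rows. Assumes every row and column total is nonzero."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (obs - expected) ** 2 / expected
    return stat


def percent_misclassified(table):
    """Percentage of examinees in the off-diagonal cells of a 2 x 2 table."""
    n = sum(sum(row) for row in table)
    return 100.0 * (table[0][1] + table[1][0]) / n


# Hypothetical counts: rows are MT decisions, columns are instructional status.
#             post  pre
decisions = [[18, 4],    # mastery
             [5, 19]]    # nonmastery
print(round(chi_square(decisions), 2), round(percent_misclassified(decisions), 1))
```

Here a "misclassified" examinee is taken to be a master who has not yet received instruction or a nonmaster who has, which is an interpretation of the text rather than a definition the paper provides.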



Results 

Table 2 displays students' mean scores and standard
deviations on each subtest of the MT, on the subscale and total scores
of the SRA Reading Achievement Test, and on the isolated word reading
and passage reading scales of the Word Reading Test.



Insert Table 2 about here 

Consistency of Performance on Administrations of the Same Test

Test-retest reliability correlations on subtests of the MT are
displayed in Table 3. For the comprehension subtests, correlations
were moderate, ranging from .74 to .86; for the vocabulary and
decoding subtests, correlations were high, ranging from .84 to .91;
for the study skills subtests, correlations were low to moderate,
ranging between .49 and .64. All coefficients for the test scales and
total test were at least .90 with the exception of the Study Skills
Scale, which had a coefficient of .69.
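
For readers who want to see the computation behind a test-retest coefficient, a minimal sketch follows. It assumes the coefficients are ordinary Pearson product-moment correlations between the two administrations; the helper name and the eight pairs of subtest scores are hypothetical and are not data from the study.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation between two score lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two lists of equal length with at least 2 scores")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when a list has no variance")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical raw scores for eight examinees on two administrations.
first_scores = [6, 8, 7, 9, 5, 10, 4, 7]
second_scores = [7, 8, 6, 9, 5, 9, 5, 8]
r = pearson_r(first_scores, second_scores)
print(f"test-retest r = {r:.2f}")
```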



Insert Table 3 about here 



The second analysis of the consistency of performance involved
calculating the percentages of examinees who had different percentage
correct scores across the two administrations of the MT. Figures 1-2
are graphic displays of the percentages of examinees displaying
various difference scores on each subtest of the MT; Table 4
summarizes the information illustrated on the graphs. The range of
difference scores on the subtests fell between 0 and 57%. The
percentage of examinees with 0% difference scores on two
administrations ranged from 18 on the inferential comprehension
subtest to 73 on the prefixes subtest. Across the comprehension
subtests, the mean percentage of examinees with 0 difference scores
was 22.50 (SD = 6.36); across the vocabulary subtests, the mean
percentage was 41.00 (SD = 19.79); across the decoding subtests, the
mean percentage was 59.00 (SD = 19.79); across the study skills
subtests, the mean percentage was 31.50 (SD = 6.36); and across all
the subtests, the mean percentage was 38.50 (SD = 18.24).
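
The scale-level summaries above appear to be means and sample standard deviations taken over the two subtests in each scale. The sketch below checks that reading for the two scales whose subtest values can be read directly from the source: comprehension (27 for literal comprehension in Table 4, 18 for inferential comprehension in the text) and study skills (36 and 27 in Table 4). Treating the SD as the n - 1 sample formula is an assumption; the paper does not state which formula was used.

```python
from statistics import mean, stdev

# Percentages of examinees with a 0% difference score, per subtest.
zero_difference_pct = {
    "comprehension": [27, 18],  # literal (Table 4), inferential (text)
    "study skills": [36, 27],   # respellings and accents, parts of an outline
}

# stdev() uses the sample (n - 1) formula, assumed here to match the paper.
summary = {scale: (mean(v), stdev(v)) for scale, v in zero_difference_pct.items()}
for scale, (m, sd) in summary.items():
    print(f"{scale}: mean = {m:.2f}, SD = {sd:.2f}")
```

Under that assumption both scales reproduce the reported figures, 22.50 and 31.50 with an SD of 6.36 in each case.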



Insert Figures 1-2 and Table 4 about here 



The third analysis of the consistency of performance addressed
consistency of mastery-nonmastery decisions across the two
administrations of the MT. Table 5 is a display of the uncorrected
and corrected proportions of examinees placed into the same decision
category on the two administrations. On the comprehension,
vocabulary, and decoding subtests, the corrected proportions were
high, ranging from a proportion of agreement on the word meaning
subtest of 55% higher than chance to a proportion of agreement on the
prefixes subtest of 100% greater than chance. On the study skills
subtests, the proportion of agreement was low for the respellings and
accents subtest (29% greater than chance), but higher for the parts of
an outline subtest (61% greater than chance).



Insert Table 5 about here 

Criterion Validity

Correlational analyses were conducted between the MT subtests and
two criterion measures, the SRA Reading Achievement Test and the Word
Reading Test. Correlations between the MT subtests and the SRA
Subscale and Total Test scores are displayed in Table 6. They ranged
from .48 to .78 (SD=.09) when SRA vocabulary subscale scores were
involved; from .36 to .73 (SD=.10) when SRA comprehension subscale
scores were employed; and from .30 to .55 (SD=.07) when SRA Total
Scores were used. The mean correlation for MT comprehension subtests
was .55 (SD=.11); for the vocabulary subtests, .55 (SD=.12); for
decoding subtests, .47 (SD=.13); and for study skills subtests, .50
(SD=.08).

Insert Table 6 about here 

Correlations between the MT subtests and the Word Reading Test
subscale scores are displayed in Table 7. They ranged from .31 to .82
when isolated word reading scores were involved, and from .33 to .85
when passage reading scores were employed. The mean correlation for
the MT comprehension subtests was .56 (SD=.18); for the MT vocabulary
subtests, .77 (SD=.08); for the MT decoding subtests, .56 (SD=.14);
and for the MT study skills subtests, .44 [SD illegible].

Insert Table 7 about here

Criterion validity also was examined by inspecting the relation
between mastery-nonmastery decisions on the MT and actual pre-post
instructional status. Relevant chi-square values, phi coefficients,
and percentages of misclassified students are displayed in Table 8.
Although all chi-squares were significant, the level of
misclassification and relationship between actual reading level and
criterion performance showed more modest effects. Across the
comprehension subtests of the MT, the average percentage of
misclassified students was 15.0 (SD=9.90); across the vocabulary
subtests, 20.0 (SD=6.36); across the decoding subtests, 37.0
(SD=6.36); across the study skills subtests, 22.0 (SD=9.90); and
across all the subtests, 21.0 (SD=7.23). The phi coefficient
describing the relationship between reading level and criterion
performance was generally moderate, ranging from .29 to .77, with a
median relationship of .68 for the subtest composites.
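
The phi coefficients can be related to the chi-square values through the conventional formula phi = sqrt(chi-square / N). The sketch below applies that formula, with N = 46 from the Table 8 heading, to several chi-square values that are legible in Table 8. That the authors computed phi in exactly this way is an assumption; the function name is hypothetical.

```python
import math


def phi_from_chi_square(chi_square, n):
    """Phi coefficient implied by a chi-square statistic and sample size."""
    if n <= 0 or chi_square < 0:
        raise ValueError("n must be positive and chi_square non-negative")
    return math.sqrt(chi_square / n)


N = 46  # sample size given in the Table 8 heading

# (chi-square, reported phi) pairs as printed in Table 8.
reported = {
    "Vocabulary Subtests": (23.90, 0.72),
    "Word Meaning": (27.40, 0.77),
    "Context Clues": (20.37, 0.67),
    "Decoding Subtests": (18.85, 0.64),
    "Total Test": (24.86, 0.74),
}

for name, (chi2, phi_reported) in reported.items():
    phi = phi_from_chi_square(chi2, N)
    print(f"{name}: computed phi = {phi:.2f}, reported phi = {phi_reported:.2f}")
```

For these rows the computed and reported values agree to two decimals, which supports reading phi as a rescaling of chi-square by sample size.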



Insert Table 8 about here 



Discussion 

The purpose of the current study was to describe the reliability
and criterion-related validity of a typical basal reading series,
criterion-referenced mastery test. The study examined two aspects of
the technical adequacy of the Ginn 720 End-of-level 11 Mastery Test:
(a) the consistency of students' performance on two administrations of
the test, and (b) the criterion validity of the test with respect to
two other measures of reading proficiency that have demonstrated
psychometric strength. On these indices, the Ginn 720 MT total score
seemed adequate. Some subtests appeared adequate; others did not.

Test-retest reliability coefficients indicated that, when the MT
was administered twice within a short time interval, students'
performance was somewhat inconsistent on the study skills subtests and
on the literal comprehension subtest; none of the correlations
obtained for these subtests even fell within the acceptable range for
making group decisions (Salvia & Ysseldyke, 1981). Nevertheless, for
the remaining comprehension subtest, for all of the vocabulary and
decoding subtests, and for the total score and all scales except study
skills, correlations were high and fell into the acceptable range for
making individual decisions.

The pattern of results of this traditional correlational analysis
of consistency of student performance across testings was corroborated
with the criterion-referenced strategy of examining the proportions of
examinees consistently classified into the same decision category. As
with the correlational analyses, statistics were generally high. All
corrected proportions fell above 80% better than chance agreement
except for the respellings and accents, the parts of an outline, the
word meaning, and the literal comprehension subtests; only the
corrected proportion for the respellings and accents subtest fell
below 55% better than chance.

Inspection of the consistency of test scores displayed in Figures
1 and 2 and in Table 4 reveals that the percentages of examinees
scoring the same across two administrations of the MT were variable.
The average percentage of subjects obtaining the same score across all
subtests was 38.5, with the percentages for the decoding and
vocabulary subtests relatively high and with percentages for the
comprehension and study skills subtests comparatively low. Given the
fact that there are relatively few items in many subtests and given a
mastery criterion of 79% to 86% per subtest, one might expect a
difference of one or two items correct in an administration of an MT
subtest to result in different mastery decisions. Interestingly,
however, there appears to be no relation between the numbers of items
in subtests and either the proportions of examinees placed into the
same decision category on the two administrations or the test-retest
reliability coefficients.
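
The sensitivity of mastery decisions to one or two items can be made concrete with the item counts listed in Table 4. The sketch below assumes, purely for illustration, that each subtest's cutoff is the smallest number of correct items reaching 79%, the lower end of the stated 79% to 86% range; the paper does not report the per-subtest cutoffs, and the function name is hypothetical.

```python
# Number of items per subtest, as listed in Table 4.
ITEMS = {
    "Literal Comprehension": 10,
    "Inferential Comprehension": 25,
    "Word Meaning": 25,
    "Context Clues": 10,
    "Prefixes": 6,
    "Suffixes": 24,
    "Respellings and Accents": 7,
    "Parts of an Outline": 7,
}

CUTOFF_PERCENT = 79  # assumed lower bound of the stated mastery criterion


def min_items_for_mastery(n_items, cutoff_percent=CUTOFF_PERCENT):
    """Smallest number of correct items whose percentage meets the cutoff."""
    # Integer ceiling of cutoff_percent * n_items / 100.
    return -(-cutoff_percent * n_items // 100)


for name, n in ITEMS.items():
    needed = min_items_for_mastery(n)
    print(f"{name}: {needed}/{n} correct = {100 * needed / n:.1f}%; "
          f"one item = {100 / n:.1f} percentage points")
```

On a 7-item subtest a single item is worth about 14 percentage points, so an examinee with 6 correct on one administration and 5 on the next would cross the assumed cutoff, whereas on a 25-item subtest one item is worth only 4 points.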

Therefore, although the results displayed in Figures 1 and 2 and
in Table 4 appear variable and somewhat low for both comprehension
subtests, the other consistency analyses generally support the
adequacy of the Ginn 720 End-of-level 11 Mastery Test, with the
exception of the study skills subtests. These findings are contrary
to those of Tindal et al. (1983) who examined the End-of-level 11
Houghton-Mifflin Basic Reading Test and found that (a) the
reliabilities of the study skills subtests were higher than those of
the decoding and comprehension subtests, and (b) the results of the
test-retest correlations and the corrected proportions of subjects
placed into the same decision categories were lower than the results
of the analysis involving the percentages of examinees who had
different percentage correct scores across the two testings. Since,
in both studies, the analysis of the consistency of mastery/nonmastery
decisions was less variable and agreed better with the test-retest
correlations than did the analysis of the consistency of test scores,
one might conclude tentatively that the consistency of decisions
analysis is a more useful, and perhaps more valid, strategy for
examining the reliability of criterion-referenced tests.

The criterion validity of the MT also was examined in this study.
The traditional correlational analyses indicated that the criterion
validity of the MT with respect to the SRA Reading Achievement Test
was marginal, with only 15% of correlations between the MT and the SRA
Scales falling above .70 and none at or above .80. Correlations
between the MT and the Word Reading Test scales were generally higher,
with 38% falling above .70 and 23% at or above .80. Given that both
the Word Reading Test and the MT are curriculum-based, one might
expect a stronger relation between these two measures than between the
MT and the SRA Reading Achievement Test. Nevertheless, correlations
among curriculum-based measures and more global indices have been
reported frequently at high levels (Fuchs et al., in press; Fuchs,
Fuchs, & Deno, 1982). This indicates that the correlations between
the mastery and SRA tests are comparatively low, and that performance
on the MT predicts concurrent performance on more global measures of
reading proficiency relatively poorly.
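
The 15% figure for the SRA scales can be reproduced by tallying the Table 6 correlations. The sketch below assumes that "the SRA Scales" refers to the vocabulary and comprehension columns only (26 coefficients, excluding the Total column) and that "above .70" is a strict inequality; both are interpretations, and the helper name is hypothetical.

```python
# Table 6 correlations, rows in table order (13 MT scores per SRA scale).
sra_vocabulary = [.70, .52, .65, .70, .73, .52, .65, .48, .66, .69, .54, .60, .78]
sra_comprehension = [.73, .58, .66, .64, .63, .57, .51, .36, .52, .64, .48, .55, .72]


def percent_meeting(values, threshold, inclusive=False):
    """Percentage of values above (or at or above) a threshold."""
    if inclusive:
        hits = sum(v >= threshold for v in values)
    else:
        hits = sum(v > threshold for v in values)
    return 100.0 * hits / len(values)


correlations = sra_vocabulary + sra_comprehension
above_70 = percent_meeting(correlations, 0.70)
at_or_above_80 = percent_meeting(correlations, 0.80, inclusive=True)
print(f"above .70: {above_70:.0f}%; at or above .80: {at_or_above_80:.0f}%")
```

Four of the 26 coefficients exceed .70 and none reaches .80, consistent with the percentages reported in the text.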

The criterion validity of the MT also was investigated with the
criterion-referenced strategy of examining the relation between the
mastery-nonmastery classification on the MT and actual pre-post
instructional status. Percentages of misclassifications ranged from
15% to 46% on the subtests, with 21% of students misclassified on the
total test. These figures suggest that, for classifying students into
groups for instruction within the basal reader for which the MT was
designed, the MT subtests have limited utility whereas the total test
score is more valid.

Consequently, the current study suggests that the End-of-level 11
MT varied in quality. For predicting global reading proficiency, the
usefulness of the MT appeared limited. However, for making decisions
about student placement and progress within the curriculum, results
were more favorable. Although in most analyses the study skills
subtests appeared inadequate and the comprehension subtests were of
variable quality, the decoding and vocabulary subtests were
acceptable, and the total MT score was generally reliable and valid.
This indicates that (a) educators should use the MT judiciously,
relying primarily on the decoding subtests, vocabulary subtests, and
total test scores for making decisions about mastery in the
curriculum, and (b) test developers at Ginn and Co. might consider
reexamining the study skills and comprehension subtests.

In any case, the technical adequacy of the End-of-level 11 Ginn
MT was superior to that of the End-of-level 11 Houghton-Mifflin Basic
Reading Test (Tindal et al., 1983). Interestingly, compared to
Houghton-Mifflin (Wallis, 1983), Ginn and Co. (Walker, 1981) described
preliminary examination of the quality of their mastery tests more
thoroughly and appropriately. Perhaps Ginn's somewhat more
deliberate and empirical approach to test development at least
partially explains the better reliability and validity coefficients of
their MT documented in this study. If so, this underscores two
points: (a) content and face validity are necessary but insufficient
dimensions of criterion-referenced test adequacy, and (b) test
consumers must demand empirical validation of criterion-referenced
tests before relying on such test data for making instructional
decisions.

Table 1

Examinees' Tasks on the Ginn 720 End-of-Level 11 Mastery Test

Scale/Subtest                 Examinees' Tasks

Comprehension

  Literal Comprehension       Read a factual article comprised of five
                              paragraphs. Then, answer each of 10
                              questions by selecting the correct
                              response from an array of three choices.

  Inferential Comprehension   Read a selection. Then, answer a set of
                              questions by selecting, for each question,
                              a correct response from an array of three
                              choices.

Vocabulary

  Word Meaning                Given a word or phrase, identify a
                              synonym from an array of three choices.

  Context Clues               Given a sentence with an underlined word,
                              select a synonymous word or phrase from
                              an array of three choices.

Decoding

  Prefixes                    Given a sentence with one word omitted
                              with a blank space, select a word that
                              best completes the sentence from an
                              array of three choices.

  Suffixes                    Given a sentence with one word omitted
                              with a blank space, select a word that
                              best completes the sentence from an
                              array of three choices.

Study Skills

  Respellings and Accents     Read a sentence containing one underlined
                              word. Given two respellings with
                              pronunciations and accents for the
                              underlined word, select the correct
                              respelling.

  Parts of an Outline         Read a four paragraph article. Then,
                              given a partially completed outline,
                              select, from an array of three choices,
                              a word or phrase to complete correctly
                              each omission from the outline.




Table 2

Student Performance on Measures of Reading Achievement (N=42)

Test                                 Mean      SD

End-of-level 11 Mastery Test
  Comprehension Subtests             24.6     6.0
    Literal Comprehension             6.8     2.3
    Inferential Comprehension        17.9     4.6
  Vocabulary Subtests                25.2     7.6
    Word Meaning                     17.9     5.6
    Context Clues                     7.3     2.3
  Decoding Subtests                  23.7     6.2
    Prefixes                          4.8     1.7
    Suffixes                         18.9     4.8
  Study Skills                        9.1     2.2
    Respellings and Accents           5.2     0.8
    Parts of an Outline               3.9     1.8
  Total Test                         82.6    19.3

SRA Reading Achievement Test
  Vocabulary                         28.2     7.8
  Comprehension                      30.9    12.2
  Total                              56.2    20.0

Word Reading Test
  Isolated Word Reading              50.1    24.7
  Passage Reading                   109.0    37.5

Table 3

Subtest                           Reliability

Comprehension Subtests
  Literal Comprehension               .74
  Inferential Comprehension           .86
Vocabulary Subtests                   .97
  Word Meaning                        .91
  Context Clues                       .90
Decoding Subtests                     .90
  Prefixes                            .90
  Suffixes                            .84
Study Skills Subtests                 .69
  Respellings and Accents             .49
  Parts of an Outline                 .64
Total Test                            .97

Table 4

Proportion of Subjects with Varying Percentages of Difference Scores
Across Two Administrations of the End-of-level 11
Mastery Test (N=22)

                                             Percentage Difference Score

                                         0   .08   .15   .25   .35   .45   .55   .65   .75   .85
                                        to    to    to    to    to    to    to    to    to    to
Mastery Test                 Items^a   .07   .14   .24   .34   .44   .54   .64   .74   .84  1.00

Comprehension Subtests
  Literal Comprehension           10   .27   .32   .23   .14   .05
  Inferential Comprehension       25   .50   .33   .10
Vocabulary Subtests
  Word Meaning                    25   .59   .33   .10
  Context Clues                   10   .55   .32   .14
Decoding Subtests
  Prefixes                         6   .73     0   .27
  Suffixes                        24   .50   .42   .05   .05
Study Skills
  Respellings and Accents          7   .36   .59     0   .05
  Parts of an Outline              7   .27   .41     0   .18   .05     0   .05

^a Number of items on the subtest.


Table 5

Uncorrected and Corrected Proportions of Examinees (N=22) Placed
Into the Same Decision Categories on Two Administrations
of the End-of-level 11 Mastery Test

                                     Proportion of Examinees

                                                   Corrected for
Subtest                          Uncorrected    Chance Agreements*

Comprehension Subtests
  Literal Comprehension              .86               .71
  Inferential Comprehension          .95               .91
Vocabulary Subtests
  Word Meaning                       .77               .55
  Context Clues                      .91               .81
Decoding Subtests
  Prefixes                          1.00              1.00
  Suffixes                           .95               .88
Study Skills Subtests
  Respellings and Accents            .68               .29
  Parts of an Outline                .82               .61

*(Observed - Chance Proportions) / Maximum Value that (Observed - Chance
Proportions) Can Assume.

Table 6

Correlations Between End-of-level 11 Mastery Test and SRA
Test Scores (N=42)

                                                SRA

Ginn Subtest                    Vocabulary  Comprehension    Total

Comprehension Subtests              .70          .73          .53
  Literal Comprehension             .52          .58          .38
  Inferential Comprehension         .65          .66          .50
Vocabulary Subtests                 .70          .64          .47
  Word Meaning                      .73          .63         [.?8]
  Context Clues                     .52          .57      [illegible]
Decoding Subtests                   .65          .51          .44
  Prefixes                          .48          .36          .30
  Suffixes                          .66          .52          .47
Study Skills Subtests               .69          .64          .48
  Respellings and Accents           .54          .48          .41
  Parts of an Outline               .60          .55          .39
Total Test                          .78          .72          .55

Table 7

Correlations Between End-of-level 11 Mastery Test and
Word Reading Test Scores (N=42)

                                      Word Reading Test

Ginn Subtests                   Isolated Words     Passages

Comprehension Subtests               .69             .72
  Literal Comprehension          [illegible]         .45
  Inferential Comprehension          .71             .72
Vocabulary Subtests                  .81             .85
  Word Meaning                       .82             .84
  Context Clues                      .67             .74
Decoding Subtests                    .65             .65
  Prefixes                           .44             .44
  Suffixes                           .68             .69
Study Skills Subtests                .52             .58
  Respellings and Accents            .31             .33
  Parts of an Outline                .52             .59
Total Test                           .80             .83

Table 8

Relation Between End-of-level 11 Mastery Test and
Criterion Classification (N=46)

                                                                  Percentage
Subtest                        Chi-square       p        Phi     Misclassified

Comprehension Subtests           24.36        .000   [illegible]  [illegible]
  Literal Comprehension       [illegible] [illegible] [illegible] [illegible]
  Inferential Comprehension   [illegible]     .000   [illegible]  [illegible]
Vocabulary Subtests              23.90        .000       .72          20
  Word Meaning                   27.40        .000       .77          20
  Context Clues                  20.37        .000       .67          33
Decoding Subtests                18.85        .000       .64          37
  Prefixes                       12.83        .010       .29          46
  Suffixes                       2?.97        .000       .53          35
Study Skills Subtests         [illegible]     .010       .53          22
  Respellings and Accents        13.77        .010       .55          24
  Parts of an Outline             9.81        .040       .46          26
Total Test                       24.86        .000       .74          15
Median for Subtests              21.38        .000       .68          21

[Figure 1. Displays of consistency of test scores on comprehension and
vocabulary subtests of End-of-level 11 MT. Panels: Literal
Comprehension, Inferential Comprehension, Word Meaning, and Context
Clues subtests. Horizontal axis of each panel: difference in
percentage of items correct on the subtest.]

[Figure 2. Displays of consistency of test scores on decoding and study
skills subtests of End-of-level 11 MT. Legible panels: Respellings and
Accents and Parts of an Outline subtests. Horizontal axis of each
panel: difference in percentage of items correct on the subtest.]